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Abstract 

This paper presents GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algo- 
rithm), an efficient and robust onUne algorithm for tracking subspaces from highly incomplete 
information. The algorithm uses a robust l^-noim. cost function in order to estimate and track 
non-stationary subspaces when the streaming data vectors are corrupted with outliers. We apply 
GRASTA to the problems of robust matrix completion and real-time separation of background 
from foreground in video. In this second application, we show that GRASTA performs high- 
quality separation of moving objects from background at exceptional speeds: In one popular 
benchmark video example [28], GRASTA achieves a rate of 57 frames per second, even when 
run in MATLAB on a personal laptop. 

Keywords: Grassman manifold, Lagrangian alternating direction, subspace tracking, matrix 
separation, robust PCA, video surveillance. 

1 Introduction 

Low-rank subspaces have long been a powerful tool in data modeling and analysis. Applications in 
communications [35], source localization and target tracking in radar and sonar [24], and medical 
imaging [2] all leverage subspace models in order to recover the signal of interest and reject noise. 
In these classical signal processing problems, a handful of high-quality sensors are co-located such 
that data can be reliably collected. 

The challenges of modern data analysis breach this standard setup. A first difference, one 
that cannot be overstated, is that data are being collected everywhere, on a more massive scale 
than ever before, by cameras, sensors, and people. We give just a few examples: There are an 
estimated minimum 10,000 surveillance cameras in the city of Chicago and an estimated 500,000 



1 



in London [3,33]. Nctflix collects ratings from 25 million users on tens of thousands of movies [13]. 
On its peak day of the lioliday season in 2008, Amazon.com collected data on 72 items purchased 
every second [44]. The Large Synoptic Survey Telescope, which will be deployed in Chile and will 
photograph the whole sky visible to it every three nights, will produce 20 terabytes of data every 
night [39]. 

A second and equally important difference is that, in all these examples mentioned, the data 
collected may be unreliable or an indirect indicator of what one really wants to know. The data 
are collected from many possibly distributed sensors or even from people whose responses may be 
inconsistent, and the data may be missing or corrupted. 

In order to address these issues, algorithms for data analysis must be computationally fast as 
well as robust to corruption and missing data. In this paper we present the Grassmannian Robust 
Adaptive Subspace Tracking Algorithm, or GRASTA, an online algorithm for robust subspace 
tracking that handles these three challenges at once. We seek a low-rank model for data that may 
be corrupted by outliers and have missing data values. 

GRASTA uses the natural i^-norm cost function for data corrupted by sparse outliers, and 
performs incremental gradient descent on the Grassmannian, the manifold of all d-dimensional 
subspaces for fixed d. For each subspace update, we use the gradient of the augmented Lagrangian 
function associated to this cost. GRASTA operates only one data vector at a time, making it faster 
than other state-of-the-art algorithms and amenable to streaming and real-time applications. 

1.1 Contributions 

The contributions of our work are threefold: 

1.1.1 Efficient Grassmannian Augmented Lagrangian Alternating Direction Algo- 
rithm 

We propose an efficient online robust subspace tracking algorithm - GRASTA, or Grassmannian 
Robust Adaptive Subspace Tracking Algorithm - which combines the augmented Lagrangian func- 
tion with the classic stochastic gradient framework [25] and the structure of the Grassmannian [18], 
and solves via the augmented Lagrangian alternating direction method [7] . As we discuss in detail 
in Section 2 and 3, GRASTA alternates between estimating a low-dimensional subspace S and a 
triple (s, w, y) which represent the sparse corruptions in the signal, the weights for the fit of the 
signal to the subspace, and the dual vector in the optimization problem. For estimating the sub- 
space S, GRASTA uses gradient descent on the Grassmannian with (s, w, y) fixed; for estimating 
the triple {s,w,y), GRASTA uses ADMM [7]. 

When data vectors arise from an underlying subspace which is inherently low-dimensional, and 
are corrupted with noise and outliers, GRASTA is able estimate and track the subspace successfully, 
even when the vectors are highly incomplete. 

1.1.2 Fast Robust Low-rank Matrix Completion 

We show that GRASTA can successfully recover a low-rank matrix from partial information, even 
if the partially observed entries are corrupted by gross outliers. GRASTA's incremental update 
results in a significant speed-up over other state-of-the-art robust matrix completion algorithms or 
RPCA (robust principal components analysis) algorithms. 
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1.1.3 Realtime Separation of Background and Moving Objects in Video Surveillance 

Finally, we show that the online nature of GRASTA makes it suitable for realtime high-dimensional 

sparse signal separation from a background signal, such as the task of separating background 
and moving objects in video surveillance. Compared to other RPCA methods, GRASTA can 
handle video frames at very high rates- up to 57 frames per second in our examples- even when 
implemented in MATLAB on a personal laptop, which is a significant practical advantage over 
other state-of-the-art techniques. 

This paper is organized as follows. We motivate robust online subspace tracking and give 
background on subspace tracking and matrix completion in Sections 1.2 and 1.3. The familiar 
reader can go directly to Section 2, where we formulate the robust subspace tracking problem and 
introduce the novel subspace error function in Section 2. In Section 3, we present the Grassmannian 
Robust Adaptive Subspace Tracking Algorithm (GRASTA) in detail and discuss critical parts of the 
implementation; we point out the limitations and merits as compared with other RPCA algorithms. 
In Section 4, we compare GRASTA with GROUSE and RPCA algorithms via extensive numerical 
experiments and several real-world video surveillance experiments. Section 5 concludes our work 
and gives some discussion on future directions. 

1.2 Motivations 

1.2.1 Online Subspace Tracking within Outliers 

GRASTA is built on GROUSE [4], an efficient online subspace tracking algorithm. GROUSE uses 
an Z^-norm cost function, which is problematic when facing data corruption or noise distributed 
other than Gaussian. 

As an example, we consider using subspaccs to detect anomalies in computer networks [26]. A 
non-robust subspace estimation algorithm like GROUSE would need a special anomaly detection 
component in order to differentiate anomalies and outliers from the underlying subspace of the 
traffic data. Often these types of anomaly detection components rely on a lot of parameter tuning 
and heuristic rules for detection. This motivates a more principled approach that is robust by 
design: GRASTA. 

1.2.2 Robust Principal Component Analysis 

Principal Components Analysis [21] is a critical tool for data analysis in many fields. Given a 
parameter d for the number of components desired, PCA seeks to find the best-fit (in an Z^ norm 
sense) d-dimensional subspace to data; in other words, it finds the best d vectors, the principal 
components, such that the data can be approximated by a linear combination of those d vectors. 

The residuals of an Z^-norm error function will be Gaussian distributed. Therefore, even with 
one outlier data point, the principal components can be arbitrarily far from those without the 
outlier data point [20]. Modern data applications- such as those in sensor networks, collaborative 
filtering, video surveillance or the network monitoring example just given- will all experience data 
failures that result in outliers. Sometimes the outliers are even the signal of interest, as in the 
case of network anomaly detection or identifying moving objects in the foreground of a surveillance 
camera. 
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A good deal of research is therefore focused on Robust PCA, including [11,14]. Recent work 
focuses on a problem definition which seeks a low-rank and sparse matrix whose sum is the observed 
data. The majority of algorithms use SVD (singular value decomposition) computations to perform 
Robust PCA. The SVD is too slow for many real-time applications, and consequently many online 
SVD and subspace identification algorithms have been developed, as we discuss in Section 1.3.1. 
We are therefore motivated to bridge the gap between online algorithms and robust algorithms 
with GRASTA. 

Of course we emphasize that besides the ability to do matrix separation into low-rank and sparse 
parts, GRASTA can also effectively handle the scenario where the low-rank subspace is dynamic. 

1.3 Background 

1.3.1 Subspace Tracking 

First we briefly describe the subspace tracking problem set-up and GROUSE [4] algorithm before 
reviewing previous literature on subspace tracking. 

Consider a sequence of d-dimensional subspaces St C M", d < n, and a sequence of vectors 
vt G Sf. The object of a subspace tracking algorithm is to estimate St, given only vt and the 
previous subspace estimate St-i- 

Incomplete Data Vectors Now considering the issue of incomplete data vectors, the object of 

an algorithm for subspace tracking with missing data is to estimate St given vq^- an incomplete 
version of vt, observed only on the indices Q, C {1, . . . ,n}. The GROUSE [4] algorithm addresses 
exactly this problem. GROUSE is an incremental gradient descent algorithm performed on the 
Grassmannian Q(d,n), the space of all d-dimensional subspaces of M". The algorithm minimizes 
an P-norm cost between observed incomplete vectors and their fit to the subspace variable. Each 
step of the algorithm is simple and requires very few operations. However, the use of the l"^ loss 
makes GROUSE very susceptible to outliers. 

Complete Data Vectors Comon and Golub [15] give an early survey of adaptive methods for 
tracking subspaces, both coming from the matrix computation literature, including Lanczos-based 
recursion algorithms, and gradient-based methods from the signal processing literature. 

There is a vast literature on the adaptation of QR and SVD factorizations to the adaptive, online 
context. The work in [6,34,43] are all along these lines. The fastest algorithm for incremental SVD 
is given in [9]; this algorithm makes modifications, one column at a time, to the thin SVD of a 
strictly rank-d n x n matrix in 0{'n?d) time. 

Initial work in signal processing for subspace tracking was aimed at estimating from data the 
largest eigensubspace for a signal covariance matrix. This is useful, for example, in direction-of- 
arrival (DOA) estimation: the well-known work in [38] introduces ESPRIT, a parameter estimation 
algorithm that estimates the DOA of plane waves emanating from a target and being received 
by a sensor array. ESPRIT was a follow up to the MUSIC algorithm [40], and ESPRIT gains 
computational efficiency over MUSIC for a slight tradeoff in generality of sensor array design. 
Around the same time, Yang and Kavch [47] introduced an approach for subspace tracking that, 
like GROUSE, uses incremental gradient, thus making it more suitable for adaptive estimation of 
the signal subspace and covariance matrix. This work was followed by [17, 32, 46] with various 
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improvements and convergence analyses. Unlike GROUSE, these algorithms all conduct gradient 
descent in the ambient space as opposed to operating along the geodesies of the Grassmannian. 
Also unlike GROUSE, these algorithms all require fully observed vectors. 

Smith [18,19,42] thoroughly pursued conjugate gradient descent methods on the Grassmannian 
for solving the subspace tracking problem using the Rayleigh quotient as a cost function as opposed 
to the Frobcnius norm of GROUSE. In [19] the authors give a very careful definition of the problem, 
giving a nice survey comparing the applicability of various approaches. In [18] is an extensive list 
of subspace tracking references. 

We note here that none of the work in this subsection addressed issues of robustness to corrupted 
data or missing data. 

Robust Subspace Tracking The work of [31] addresses the problem of robust online subspace 
tracking. They focus on the problem where outliers are found in a fraction of vectors (that is, some 
vectors have no outliers) , though they do remark that this can be extended to handle the case where 
outliers are sparse in every vector. They have a very nice proposition relating Z°-(pseudo)norm 
minimization to the least trimmed squares estimator. 

We note here that GRASTA differs from [31] in that it directly focuses on the case where every 
vector may have outliers, it operates on the Grassmannian for greater efficiency, and it can handle 
missing data. A comparison to [31] is a subject of future investigation. 

1.3.2 Matrix Completion 

The popular Nctflix prize [1] stimulated research on the matrix completion problem: Given very 
few entries of a low-rank matrix, can one recover (or complete) the entire matrix? When Candes 
and Recht proved that, under some incoherence conditions, nuclear norm minimization recovers a 
highly incomplete low rank matrix with high probability [12], an entire area was opened up for 
further analysis and algorithmic variations. Algorithms that have been proposed to solve matrix 
completion include ADMiRA [27], OptSpace [22], Singular Value Thresholding [10], FPCA [30], 
SET [16], APGL [45], GROUSE [4], and many others. Of these, GROUSE is the only online 
matrix completion algorithm in that it proceeds incrementally, one column at a time. This along 
with the fact that each update of GROUSE has low computational complexity makes GROUSE the 
fastest of the state-of-the-art matrix completion algorithms by nearly an order of magnitude [4] . 

2 Problem Set-up 

We denote the evolving d-dimensional subspace of as St at time t. In applications of interest 
we have d <^ n. Let the columns of an n x d matrix Ut be orthonormal and span St- Tracking the 
evolving subspace St is equivalent to estimating Ut at each time step^. 

2.1 Model 

At each time step t, we assume that vt is generated by the following model: 

Vt = UtWt + st + Ct (2.1) 
^We remind the reader here that Ut is not unique for a given subspace, but the projection matrix UtU^ is unique. 
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where wt is the dx 1 weight vector, st is the n x 1 sparse outher vector whose nonzero entries may 
be arbitrarily large, and Q is the n x 1 zero-mean Gaussian white noise vector with small variance. 
We observe only a small subset of entries of vt, denoted by Qt C {1, . . . , n}. 

Conforming to the notation of GROUSE [4] , we let Uq^ denote the submatrix of Ut consisting 
of the rows indexed by Qt] also for a vector vt G W^, let f^^denote a vector in ]RI^*I whose entries 
are those of vt indexed hj Qf A critical problem raised when we only partially observe Vt is how to 
quantify the subspace error only from the incomplete and corrupted data. GROUSE [4] uses the 
natural Euclidean distance, the Z^-norm, to measure the subspace error from the subspace spanned 
by the columns of Ut to the observed vector vq^ : 

Fgrouse{S\t) = mm.\\Uii^w - Vii^Wl . (2.2) 

It was shown in [5] that this cost function gives an accurate estimate of the same cost function with 
full data (r^ = {1, . . . as long as \VLt\ is large enough^. However, if the observed data vector is 
corrupted by outliers as in Equation (2.1), an P-based best-fit to the subspace can be influenced 
arbitrarily with just one large outlier; this in turn will lead to an incorrect subspace update in the 
GROUSE algorithm, as we demonstrate in Section 4.1. 

2.2 Subspace Error Quantification by /^-Norm 

In order to quantify the subspace error robustly, we use the l^-norm as follows: 

Fgrasta{S]t) = min.\\UQ,^w - VQ^\\l . (2.3) 

w 

With UQ,^ known (or estimated, but fixed), this minimization problem is the classic least absolute 
deviations problem; Boyd [7] has a nice survey of algorithms to solve this problem and describes in 
detail a fast solver based on the technique of AD MM (Alternating Direction Method of Multipliers)^. 
More references can be found therein. 

According to [7], we can rewrite the right hand of Equation (2.3) as the equivalent constrained 
problem by introducing a sparse outlier vector s: 

min ||s||i (2-4) 
s.t. Uq^w + s - vq^. =0 

The augmented Lagrangian of this constrained minimization problem is then 

£{s,w,y) = \\s\\i +y^{UntW + s- vq,) + ^\\Uq,w + s - vq,\\1 (2.5) 

where y is the dual vector. Our unknowns are s, y, U, and w. Note that since U is constrained to a 
non-convex manifold {U'^U = I), this function is not convex (neither is Equation (2.2)). However, 
note that if U were estimated, we could solve for the triple {s,w,y) using ADMM; also if {s,w,y) 
were estimated, we could refine our estimate of U. This is the alternating approach we take with 
GRASTA. We describe the two parts in detail in Sections 3.1 and 3.2. 

^In [5] the authors show that \Q,t\ must be larger than fx{S)dlog{2d/6), where /n(<S) is a measure of incoherence 
on the subspace and 5 controls the probabihty of the result. See the paper for details, 
^http : //www. Stanford. edu/~boyd/papers/adnmi/ 
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2.3 Relation to Robust PCA and Robust Matrix Completion 



If the subspace S does not evolve over time, this problem reduces to subspace estimation, which 
can be related to Robust PCA. For a set of time samples t = 1, . . . ,T, we observe a sequence 
of incomplete corrupted data vectors , • • • , "Uf^y • Let the matrix V = [vi, . . . ,vt]- Let Vfi{-) 
denote operator which selects from each column the corresponding indices in Qi, . . . ,^t', thus 
'Pq{V) denotes our partial observation of the corrupted matrix V. Note that from our model in 
Equation (2.1), we can write y as a sum of a sparse matrix S and a low-rank matrix L = UW, 
where the orthonormal columns of U & M"^'' span S (which is stationary), and W G M''^^ holds 
the weight vectors wt as columns. 

The global version of the cost function in Equation (2.3) follows: 



The right hand of Equation (2.6) can be rewritten as the equivalent constrained problem: 



which is the same problem studied in [41], and the authors propose an efficient ADMM solver for 
this problem. Unlike the set-up of [11, 14], this problem is not convex; however it offers much more 
computationally efficient solutions. GRASTA differs from the algorithm of [41] in two major ways: 

it uses incremental gradient to minimize this cost function one column at a time for even greater 
efficiency, and it uses geodesies on the Grassmannian to compute the update of U. 

3 Grassmannian Robust Adaptive Subspace Tracking 

As we have said, GRASTA alternates between estimating the triple (s, w, y) and the subspace U. 
Here we discuss those two pieces of our algorithm. Section 3.1 describes the update of {s,w,y) 
based on an estimate Ut for the subspace variable. Section 3.2 describes the update of our subspace 
variable to Ut+i based on the estimate of {s*,w*,y*) resulting from the first step. Finally, Section 3.4 
describes our algorithm for adaptively choosing the gradient step-size. 

3.1 Update of the sparse vector, weight vector, and dual vector 

Given the current estimated subspace Ut, the partial observation vci^, and the observed entries' 
indices Oj, the optimal {s*,w*,y*) of Equation (2.4) can be found with the following minimization 
of the augmented Lagrangian. 



T 




(2.6) 



min \\VniS)\\i 
s.t. VniUW + S) = VniV) 
U eg{d,n) 



(2.7) 



{s*,w*,y*) = arg min C{Un^, s,w,y) 



(3.1) 



s,w,y 
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Equation (3.1) can be efficiently solved by ADMM [7]. That is, s, w, and the dual vector y are 
updated in an alternating fashion: 

< s'^'^^ = argmmsC{Unt,s,w^~^^,y'^) (3.2) 

Specifically, these quantities are computed as follows. In this paper we always assume that U^^Uq^ 
is invertible, which is guaranteed if \Q,t\ is large enough [5]. We have: 

= \{UlU^r'ul{p{vn, - s^) - y^) (3.3) 
5*^+1 = S^ivn.-Un.w'^+^-y^) (3.4) 

where is the elementwise soft thresholding operator [8]. We discuss this ADMM solver in 

detail as Algorithm 2 in Section 3.5. 

3.2 Subspace Update 

As we mentioned in Section 1.3, the set of all subspaccs of R"" of fixed dimension d is called 
Grassmannian, which is a compact Riemannian manifold, and is denoted by Q{d,n). Edelman, 
Arias and Smith (1998) have a comprehensive survey [18] that covers how both the Grassmannian 
geodesies and the gradient of a function defined on the Grassmannian manifold can be explicitly 
computed. 

GRASTA achieves online robust subspace tracking by performing incremental gradient descent 
on the Grassmannian step by step. That is, we first compute a gradient of the loss function, and 
then follow this gradient along a short geodesic curve on the Grassmannian. Figure 1 illustrates 
the basic idea of gradient descent along a geodesic. 

3.2.1 Augmented Lagrangian as the Loss Function 

It seems that it would be natural to use Equation (2.3) as the robust loss function. However, there 
is a critical limitation of this approach: when regarding U as the variable, this loss function is not 
differentiable everywhere. 

Here we propose to use the augmented Lagrangian as the subspace loss function once we have 
estimated {s*,'w*,y*) from the previous Uq^ and vq^ by Equation (3.2). The new loss function is 
stated as Equation (3.6): 

jC{U) = \\s*\\i + y*^{Un,w* + s* - vn,) + ^\\Un,w* + s* - vnAl (3-6) 

This new subspace loss function is differentiable. Furthermore, when the data vector is not cor- 
rupted by outliers. Equation (3.6) reduces to the P-norm loss function of GROUSE [4]. 
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Figure 1: Illustration of the gradient descent along geodesic on the Grassmannian manifold 



3.2.2 Grassmannian Geodesic Gradient Step 

In order to take a gradient step along the geodesic of the Grassmannian, according to [18], we first 
need to derive the gradient formula of the real- valued loss function Equation (3.6) C : Q{d,n) ^ M. 

From Equation (2.70) in [18], the gradient v£ can be determined from the derivative of C with 
respect to the components of U. Let is defined to be the \i}t\ columns of an n x n identity 
matrix corresponding to those indices in fi^; that is, this matrix zero-pads a vector in rI^'I to be 
length n with zeros on the complement of 0,t. The derivative of the augmented Lagrangian loss 
function C with respect to the components of U is as follows: 

^ = [Xa (y* + p{Un,w* + s*- vn,))] w*^ (3.7) 

Then the gradient is VC = {I- UU'^)j§ [18]. Here we introduce three variables F, Fi, and F2 
to simplify the gradient expression: 



Fi = y* + p{Un,w* + s* - vnj (3.8) 
F2 = UlT, (3.9) 
F = xn,ri-UT2 (3.10) 

Thus the gradient v£ can be further simplified to: 

V£ = Tw*^ (3.11) 

From Equation (3.11), it is easy to verify that v£ is rank one since F is a n x 1 vector and w* 
is the optimal d x I weight vector. Then it is trivial to compute the singular value decomposition 
of v£, which will be used for the following gradient descent step along the geodesic according to 
Equation (2.65) in [18]. The sole non-zero singular value is o" = ||F||||i(;*||, and the corresponding 
left and right singular vectors are and tSttt respectively. Then we can write the SVD of the 
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gradient explicitly by adding the orthonormal set X2, • • • , orthogonal to T as left singular vectors 
and the orthonormal set y2, ■ ■ ■ ,yd orthogonal to w* as right singular vectors as follows: 



r 

irii 



X2 



Xd 



X diag{a, 0, . . . , 0) x 



w 



J/2 



yd 



(3.12) 



Finally, following Equation (2.65) in [18], a gradient step of length 77 in the direction —VjC is 
given by 



U{r]) = U + I {cos{r]a) - 1 



\w: 



sin{rj(j)- 



w: 



(3.13) 



3.3 Remarks 



Here we point out that at each subspace update step, our approach does not remove outliers 
explicitly. In fact, we use the gradient of the augmented Lagrangian C{U) Equation (3.6) which 
exploits the dual vector y* to leverage the outlier effect. That is the key to success. Even when 
the ADMM solver 3.2 can not identify the outliers due to our current estimated subspace being far 
away from the true subspace, with the help of the dual vector y* the gradient of the augmented 
Lagrangian gives us the right direction at each step which leads us to the right subspace. 

We also must point out that since we estimate (s*, w*,y*) at each step using the ADMM solver, 
we can not recover the exact subspace with sufficient accuracy if we do not allocate enough iterations 
for the ADMM solver [7]. Fortunately, as it also emphasized in [7], only a few tens of iterations 
per subspace update step are sufficient to achieve a modest accuracy, which is often acceptable 
for practical use. Extensive experiments in Section 4 show that our algorithm is fast and always 
produces acceptable results, even when the vectors are noisy and heavily corrupted by outliers. 



3.4 Adaptive Step-size 

The question of how large a gradient step to take along the geodesic is an important issue, and 
it depends on a fundamental tradeoff between tracking rate and steady-state error. Rather than 
the constant step-size rule proposed for subspace tracking in GROUSE, here we propose to use 
the adaptive step-size rule to achieve both precise convergence for a stationary subspace and fast 
adaptation to a changing subspace. 

We use the following formula to update the step-size rjt: 

C 



where C is the predefined constant step-size scale. If we use /it 
the step-size satisfies the following properties: 

00 

lim m = and } m = 00 

t->oo ' 

t=l 

This is the classic diminishing step-size rule in stochastic gradient descent literature, and has been 
proven to guarantee convergence to a stationary point [37] [25]. 

However, our goal is not only to identify the stationary subspace precisely. We have the more 
ambitious goal of keeping track of the subspace when the subspace is slowly changing. Obviously, 



(3.14) 

t to update rjt, it is obvious that 
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with a changing subspace, if wc use a diminishing step-size rule, when r]t is shrinking to our steps 
wiU be too small to track the dynamic subspace. To continually adapt to the changing subspace, 
GROUSE [4] proposes a constant step-size which needs careful selection to balance the tradeoff 
between tracking rate and steady-state error. 

Here we propose to use an adaptive step-size rule to produce a proper step-size r]t that empiri- 
cally achieves both precise convergence for a stationary subspace and fast adaptation to a changing 
subspace. The basic idea is inspired by Plakhov [36] and Klein [23]: if two consecutive gradients 
VCt-i and VCt are in the same direction, i.e. {V>Ct_i, VA) > 0, it intuitively means that the cur- 
rent estimated Ut is relatively far away from the true subspace St- If this is the case, hcuristically 
we should take a slightly larger step along ~V Ct than the previous step-size r/t_i. Otherwise, if 
VCt-i and VCt are not in the same direction, i.e. (Vi2i_i, Vi2t) < 0, again intuitively this means 
that the current estimated Ut is relatively close the true subspace 5^, and again heuristically we 
should take a slightly smaller step along — v£( than the previous step-size rjt-i. Besides the sign 
of the two consecutive gradients giving us intuition for the step-size adaptation, the inner product 
{s/Ct-i, ^Ct) also gives us the proper adapted magnitude for our step-size [36] [23]. 

We still use Equation (3.14) to produce rjt at each time, but update nt according to the inner 
product of two consecutive gradients {VjCt-i,VJCt) as follows: 

Ut = inax{/it_i + sigmoid{—{s/Ct-i,'^Ct)),0} (3.15) 

where the sigmoid function is defined as: 

Jmax — Imin 



sigmoid{x) = /min + 



1 - ifMAx/fMIN)e-^/'' 



with sigmoid{0) = 0, /max > 0, Jmin < 0, and cu > 0. /max and Jmin are chosen to control 
how much the step-size grows or shrinks; and lo controls the shape of the sigmoid function. In this 
paper we always set /max = Ij /min = — Ij and u = 0.1. 

When the estimated subspace Ut is very close to the true subspace St, the adaptive step-size 
rjt — >■ 0, or equivalently fit — ^ +oo from Equation (3.14). Now we consider the following scenario: 
suppose we have identified the subspace precisely- and therefore fit > N for some large number 
N then suddenly the subspace changes dramatically. How quickly will this step-size rule adapt to 
the new subspace? In practical applications, taking too much time to adapt to the new subspace 
is undesirable. Specifically, only shrinking fit at most {/minI is too conservative in this scenario. 
It is easy to verify that, since at each update step ftt shrinks at most {/minI, the increase of rjt 
is limited and therefore this approach wouldn't take very large steps even though the subspace 
has changed. When the subspace changes drastically, we should shrink fit more to accelerate the 
adaptation process. 

For GRASTA, wc take this approach and call it a "Multi-Level" adaptive step-size rule. Though 
we do not provide the convergence proof here, empirically this multi-level adaptive approach demon- 
strates much faster convergence performance than the single-level strategy discussed above. We 
leave further detailed comparison to future investigation. 

Our multi-level adaptation is as follows. We only let ftt change in (fiMlN, ^^MAx), where fiMiN 
and fiMAX are prescribed constants. For the experiments in this paper we always set fiMiN = 1 
and fiMAX = 15. Then in this case Equation (3.15) is adapted to Equation (3.16): 

fit = max{/xt_i -I- sigmoid{-{s/jCt-i, ^jO,t)), Umin} (3.16) 
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We introduce a level variable It that will get smaller when our subspace estimate is far from the 
data. Then the step-size rjt is as follows: 



Once nt calculated by Equation (3.16) is larger than fiMAX, wc increase the level variable k by 
1 and set fit = /io, where £ ifJ-MlN, fJ-MAx) and is selected close to fiMlN (in our experiments 
we let jiQ = 3). If iJ:t < IJ'MIN, we decrease k by 1 and also set iit = Ho- Therefore, when our 
subspace estimate is off, we are increasing rjt exponentially instead of linearly. On one hand this 
new multi-level adaptive rule follows the basic adaptive step-size rule discussed above; on the other 
hand exploiting this multi-level property, this new approach adapts more quickly to a changing 
subspace. Once we have identified the subspace changing and nt < Umin, if the subspace really 
changes dramatically, It will keep decreasing until fit is again within the range {fJ-MiN, fJ-MAx)- 

Combining these ideas together, we state our novel adaptive step-size rule as Algorithm 3. 



3.5 Algorithms 

The discussion of Sections 3.1 to 3.4 can be summarized into our algorithm as follows. For each 
time step t, when we observe an incomplete and corrupted data vector v^^ , our algorithm will first 
estimate the optimal value {s*,w*,y*) from our current estimated subspace Ut via the minimiza- 
tion ADMM solver 3.2; then compute the gradient of the augmented Lagrangian loss function jC 
by Equation (3.11); then estimate a proper step-size rft from the two consecutive gradients V Ct-i 
and V Ct by Equation (3.15) and 3.17 ; and finally do the rank one subspace update via Equation 
(3.13). 

We state our main algorithm GRASTA (Grassmannian Robust Adaptive Subspace Tracking 
Algorithm) in Algorithm 1. GRASTA consists of two important sub-procedures: the ADMM 
solver of the least absolute derivations problem, and the computation of the adaptive step-size. We 
state the two sub-procedures as Algorithm 2 and Algorithm 3 separately. 

Unlike GROUSE, which has a closed form solution for computing the gradient, GRASTA es- 
timates {sl^wl^Ht) by the ADMM iterated Algorithm 2. Certainly we would have a potential 
performance bottleneck if Algorithm 2 takes too much time at each subspace update step. How- 
ever, we see empirically that only a few tens of iterations in Algorithm 2 at each step allows 
GRASTA to track the subspace to an acceptable accuracy. In our video experiments with Algo- 
rithm 2, we always set the maximum iteration K around 20 to balance the trade-off between the 
subspace tracking accuracy and computational performance. We make a slight modification to the 
original ADMM sovler presented in [7] : in addition to returning w* we also return the sparse vector 
s* and the dual vector y* for the further computation of the gradient v£. It is easy to verify that 
in the worst case the ADMM solver needs at most 0{\Q.\d? + Kd\9.\) flops. 

In order to produce the proper step-size rft from Algorithm 3, we need to maintain the gradient 
VCt-i from the previous time step throughout the subspace tracking process. Keeping VCt-i only 
requires additional 0(n -|- d) memory usage. The main computation of of Algorithm 3 is the inner 
product (v£t_i, v£t), which is the trace of the product V>C(_i and V>Ct, two n x d matrices, and 
will cost 0{nd^) flops. 
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Algorithm 1 Grassmannian Robust Adaptive Subspace Tracking 

Require: An nxd orthogonal matrix C/q- A sequence of corrupted vectors vt, each vector observed 
in entries C {l,...,n}. A structure OPTSl that holds parameters for ADMM. A structure 
0PTS2 that holds parameters for the adaptive step size computation. 
Return: The estimated subspace Ut at time t. 

1: for i = 0, . . . , T do 

2: Extract Ua,t from Ut: U^t = X^Ut 

3: Estimate the sparse residual s^, weight vector wl, and dual vector from the observed 
entries via Algorithm 2 using OPTSl: 

4: Compute the gradient of the augmented Lagrangian C, \/C as follows: 

ri = y* + p{UnM + 4-vn,), r2 = ulr,, r = xn*ri-t/r2 

= Twf 

5: Compute step-size % via the adaptive step-size update rule according to Algorithm 3 using 
0PTS2. 

6: Update subspace: Ut+i = Ut + {{cos{r]ta) - l)UtT^^ - sin{r]ta) tS^A 



where a = \\T\\ \\wt\ 



7: end for 



Algorithm 2 ADMM Solver for Least Absolute Deviations [7] 

Require: An x d orthogonal matrix Ufi^, a corrupted observed vector vq^ G mI^*I , and a 
structure OPTS which holds four parameters for ADMM: ADMM step-size constant p, the absolute 
tolerance e"^*, the relative tolerance e^^^ and ADMM maximum iteration K. 
Return: sparse residual s* G rI*^'!; weight vector w* G W^; dual vector y* G MI^*L 
1: Initialize s^w^y: = s*^, = w^, y^ = y^ 

(either to zero or to the final value from the last subspace update of the same data vector for 
a warm start.) 
2: Cache P = {UlUnJ-^Ul 
3: for = 1 ^ K do 

4: Update weight vector: w^~^^ = ^P{p{vu^ — s^) — y^) 

5: Update sparse residual: s'^"'"^ = S^{vq^ — Uq^w^^^ — y^) 

p+i 

6: Update dual vector: = + piUn^w'^^^ + s^^^ - vnj 

7: Calculate primal and dual residuals: r^" = \\UQ^w'''^^+s''~^^-vn^\\, r*^""' = \\pU^^{s^'^^-s'')\\ 
8: Update stopping criteria: e^" = y/\Qt\f°'^'' + e""^' T^ax{\\Un^w'^'^^\\, ||^^f^J|}, 

9: if rP" < eP" and r'^'"'' < e*^""' then 
10: Converge and break the loop. 
11: end if 
12: end for 

13: s* = s^+\ w* = y* = 
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Algorithm 3 Multi-Level Adaptive Step-size Update 



Require: Previous gradient V£(_i at time t — 1, current gradient v£t at time t. Previous step- 
size variable //j-i. Previous level variable It-i- Constant step-size scale C. Adaptive step-size 
parameters /max, /min, fJ-MAX, fJ-MiN- 

Return: Current step-size rjt, step-size variable fit, and level variable k- 

1: Update the step-size variable: fit = max{/xt_i -|- sigmoid{—{'^ Ct-i, ^ Ct)), fiMm} 
where sigmoid function is defined as: 

fMAX—fhllN 



sigmoid{x) = Jmin + (/"^•^^7-^""f^_./^ , with sigmoid{fd) = 0. 
if > flu AX then 

Increase to a higher level: lt = lt-\-\-^ and fit = fJ-o 
else if fit < fiMiN then 

Decrease to a lower level: It = It-i — ^ and fit = yWo 
else 

Keep at the current level: It = It-i 
end if 

Update the step-size: rjt = C2~'*/(l -|- fit) 



3.6 Computational Cost and Memory Usage 

Each subspace update step in GRASTA needs only simple linear algebraic computations. The total 
computational cost of each step of Algorithm 1 is 0(|ri|£i^ -|- i<rd|0| -|- nd^), where again \Q\ is the 
number of samples per vector used, d is the dimension of the subspace, n is the ambient dimension, 
and K is the number of ADMM iterations. 

Specifically, estimating {s*,Wt,y*) from Algorithm 2 costs at most 0(|r2|(i'^ -|- flops; 
computing the gradient Vi2 needs simple matrix-vector multiplication which costs 0(|0|d -|- nd) 
flops; producing the adaptive step-size costs 0{nd'^) flops; and the final update step also costs 
0(nd2) flops. 

Throughout the tracking process, GRASTA only needs 0{nd) memory elements to maintain 
the estimated low-rank orthonormal basis Ut, 0{n) elements for s* and y*, 0{d) elements for w* , 
and 0(n -|- d) for the previous step gradient ^ Ct-i in memory. 

This analysis decidedly shows that GRASTA is both computation and memory efficient. 



4 Numerical Experiments 

In the following experiments, we explore GRASTA's performance in various scenarios: subspace 

tracking, robust matrix completion, and the video surveillance application. We use relative error 

to quantify the performance of GRASTA. If the recovered data is a vector, the relative error is 

defined as follows: ^ 

\\v — vWo 

RelErr = " „ „ " (4.1) 
If the recovered data is a matrix, the relative error is defined as follows: 

IIM - Mllp 

RelErr = " , , (4.2) 
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We also use " Noise Relative Power" to quantify the additional Gaussian white noise perturbation, 
which is defined as follows: 

Nnei = ^ (4.3) 

Here v is the true data vector and C is the additional Gaussian noise as in Equation (2.1). 

In all the following experiments, we use Matlab R2010b on a Macbook Pro laptop with 2.3GHz 
Intel Core 15 CPU and 4 GB RAM. To improve the performance, we implement Algorithm 2 in 
C++ and make it as a MEX-file to be integrated into GRASTA Matlab scripts. 

4.1 Comparison with GROUSE 

Our first goal is to compare GRASTA with the non-robust algorithm GROUSE to show the need 
for a robust subspace estimation and tracking algorithm. 

4.1.1 Subspace Tracking with Sparse Outliers 

In many of the following experiments, we use this generative model to generate a series of data 
vectors: 

Vt = UtrueWt + St + Ct ■ (4.4) 

Utrue is an n X d matrix whose d columns are realizations of an i.i.d. AA(0, /„) random variable that 
are then orthornomalized. The weight vector wt \s a, d x 1 vector whose entries arc realizations of 
i.i.d. AA(0, 1) random variables, that is Gaussian distributed with mean zero and variance 1. The 
sparse vector is an n x 1 vector whose nonzero entries are Gaussian noise with the maximum of 
the data vector UtmeW as the variance; the locations of the nonzero entries are chosen uniformly 
at random without replacement. The noise (^t is an n x 1 vector whose entries are i.i.d M{Q,uj'^). 
This parameter cj^ governs the SNR with respect to the low-rank part of our data. For the entire 
comparison against GROUSE, we used a maximum of = 60 iterations of the ADMM algorithm 
per subspace iteration. 

Figure 2 illustrates the failure of GROUSE, and success of GRASTA, when these sparse outliers 
arc added only at periodic time intervals. We can see that GROUSE is significantly thrown off, 
despite the outliers occurring in an isolated vector. This illustrates clearly our motivation for adding 
robustness to the subspace tracking algorithm. 

4.1.2 Robust Matrix Completion 

We aim to complete 500 x 500 dimensional matrices of rank 5. The matrices are corrupted by dif- 
ferent fractions of outliers, depending on the experimental setting, and sampled uniformly without 
replacement with density 0.30. We generate the low-rank matrix by first generating two 500 x 5 
factors Yl and Yr with i.i.d. Gaussian entries and then adding normally distributed noise with 
variance u^. The location of sparse outliers is distributed uniformly, and the outlier values are 
normally distributed with variance equal to the maximum of the matrix. 

For each setting of the fraction of outliers, we randomly generate 5 matrices, each of which is 
solved via GROUSE and GRASTA separately. Both GROUSE and GRASTA cycle through the 
matrix columns 10 times. Table 1 shows the averaged results of a comparison between GROUSE 
and GRASTA. As expected, GRASTA vastly outperforms GROUSE across the board even with 
the smallest number of outliers. 
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# of vectors seen 

Figure 2: Subspace tracking comparison between GROUSE and GRASTA from partial in- 
formation. At time 500, 1000, . . ., and 4500, 10% observed entries are corrupted by outliers, 
and all entries are perturbed by small Gaussian noise with the variance of ixP' — 10^^. 



Fraction of Outliers 








0.01 


0.05 


0.10 


0.15 


0.20 


GROUSE 


3.17E-6 


3.95E-1 


5.04E-1 


8.27E-1 


8.79E-1 


9.35E-1 


GRASTA 


7.25E-6 


8.92E-5 


1.13E-4 


2.14E-4 


2.91E-4 


4.41E-4 



Table 1: Robust matrix completion comparison between GRASTA and GROUSE. We only 
observe 30% of the low-rank matrix which is corrupted by sparse outliers. We show the averaged 
results of 5 trials with different fractions of outliers. Here the matrix is 500 x 500, the rank is 
5, and all entries are perturbed by small Gaussian noise with the variance of = 10~^. 

4.2 Stationary Subspace Identification 

Now we wish to examine GRASTA's performance on the stationary subspace identification problem 
under various conditions. In most experiments (and unless otherwise noted) the ambient dimension 
is n = 500 and the inherent subspace dimension is d = 5. We again generate the vectors using 
Equation (4.4) above and the descriptive text that follows Equation (4.4). We vary the fraction of 
entries that are corrupted, and we vary the fraction of entries that are observed. 

We start with Figure 3, which shows subspace estimation performance under a varying fraction 
of added outliers. We can see in this problem instance that with 10% corrupted entries, the relative 
error reaches the relative noise floor after a number of iterations that is a small multiple of the 
ambient dimension. For more corruption, more vectors (gradient iterations) are needed, but even 
with 50% outliers and more, the relative error trends toward the relative noise power. 

In Figure 4, we consider GRASTA's error performance for varying sub-sampling rates. Here 
the fraction of corrupted values is fixed at 10%. We can see that again, even with a 30% sampling 
rate, the relative error quickly reaches the relative noise power. 

Now we wish to take a closer look at the case when we have both dense outlier corruption 
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0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 

# of vectors seen ^^q* 



Figure 3: The performance of stationary subspace identification using full information within 
different fractions of outliers. We show the results from sparse outliers 10% to dense outliers 
70%. The ambient dimension is n = 500, and the subspace dimension is d = 5. All observed 
entries are also perturbed by small Gaussian noise with the variance of uj^ = 10^^. 




0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 

# of vectors seen 



Figure 4: The performance of stationary subspace identification within 10% outliers using 
partial information. We show the results with different sub-sampling ratios, from using just 
10% information to using full information. The ambient dimension is n = 500 and subspace 
dimension d — 5, and all observed entries are also perturbed by small Gaussian noise with the 
variance of = 10^^. 
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and subsampling of the signal. This is an important scenario for appUcations where the "outher 
corruption" is a signal of interest obscuring a low-rank background signal, and we wish to subsample 
in order to improve computational complexity. For example this would apply to anomaly detection 
problems or to the problem of separation of background and moving objects in video as we show 
in Section 4.5. 

Figure 5 illustrates that even when the vector is highly corrupted with 50% added outliers, 
GRASTA can identify the underlying low-rank subspace even with only 50% of the entries. We 
vary the dimension (or rank) of the underlying subspace, and because of this there is not one 
relative noise power benchmark to compare against; however we see that the trend is similar to 
those in previous figures. 




^ Q"'' I I \ \ \ \ \ \ \ \ 

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 

# of vectors seen ^ 

Figure 5: The performance of subspace identification under dense error corruption. Here 
Ps = 0.5 which means 50% entries of every data vector are corrupted, and we only observe 50% 
entries of each vector. The ambient dimension is n = 500, we vary the inherent dimension d, 
and all observed entries are also perturbed by small Gaussian noise with the variance of 10~^. 
We generate 20000 vectors to show the performance over time. 



4.3 Dynamic Subspace Tracking 

The fact that GRASTA operates one vector at a time allows it to track an evolving subspace. 
In this section we show GRASTA's performance under two models of evolving subspaces. In 
these experiments, we use the same set-up as before: n = 500, d = 5, and vt is generated by 
Equation (4.4), except that Utme = U[t], i.e. the subspace we wish to estimate varies with time t: 

Vt = U[t]wt + st + Ct ■ (4.5) 

4.3.1 Rotating Subspace Tracking 

We use the following ordinary differential equation to simulate a rotating subspace: 

if = BU, U[0] = Uo (4.6) 
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where i? is a skew-symmetric matrix. Consequently, the subspace U[t] is updated via 



U[t] = e'^^Uo , 

where 5 controls the amount of rotation of with each time step t. As we see in Figure 6, for 
the rotation parameter 6 fixed at 10^^, GRASTA successfully latches on and tracks the rotating 
subspace. 




# of vectors seen 

Figure 6: The performance of tracking a rotating subspace within 10% outliers. At every 
time the subspace rotates S ~ 10^^. The noise variance is — 10^^. We show the results 
with varying sub-sampling. 



4.3.2 Sudden Subspace Change Tracking 

For this experiment, we wanted to see the behavior of GRASTA when the subspace experienced 
a sudden dramatic change. At intervals of 5000 vectors, we randomly changed the true subspace 
to a new random subspace. The results are in Figure 7. Again from these simulations we see that 
GRASTA successfully identifies the subspace change and tracks the subspace again. 

4.4 Comparison with Robust PCA 

Here we compare GRASTA with RPCA on the recovery of corrupted low-rank matrices. For RPCA 
we use [29], or the lALM (Inexact Augmented Lagrange Multiplier) method^. 

The corrupted matrices can be written as M = L + S + A^, where L are the low-rank matrices 
we want to recover, S are the sparse outlier matrices, and N are the Gaussian noise matrices 
with small variance relative to the sparse outliers. We use d = 5 matrices of size 2000 x 2000 to 
do the comparison. The low-rank matrices L are generated by the same method of the previous 
robust matrix completion experiments: as the product of two 2000 x 5 factors Yl and with i.i.d. 

*The code we used is available here: http://perception.csl.uiuc.edu/matrix-raiik/sainple_code.html. We 
downloaded it in April 2011. 
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Gaussian entries. The sparse outlier matrices S are generated by selecting a fraction of entries 
uniformly at random without replacement, whose values are set according to Gaussian distribution 
with the maximum of L as the variance^. We vary the fraction of corruptions from 10% (sparse 
outliers) to 50% (dense outliers), and we also vary the variance Gaussian noise matrices N from 
a moderate perturbation of cj^ = 10"'^ to a larger perturbation of = 10~^. For GRASTA we 
cycled through the matrix columns twice and used a maximum of X = 60 iterations of the ADMM 
algorithm; we used a maximum of 20 iterations of lALM. 

Table 2 shows the results of the comparison. We ran RPCA with full data, GRASTA with full 
data, and then GRASTA with various levels of subsampling. When there are very few outliers and 
little noise, RPCA achieves a reasonable error rate at computational speeds similar to GRASTA. 
However with an increase in noise or fraction of outliers, GRASTA achieves good error performance 
in much less time. As a particular example, when o;^ = 10~^ and the fraction of outliers is 30%, in 
69 seconds GRASTA with full data achieves better error performance than RPCA in 363 seconds, 
and GRASTA with 30% subsampling achieves better error performance in only 23 seconds. 

4.5 Realtime Video Background Tracking and Foreground Detection 

In this subsection we discuss the application of GRASTA to the prominent problem of realtime 
separation of foreground objects from the background in video surveillance. Imagine we had a 
video with only the background: When the columns of a single frame of this video are stacked 
into a single column, several frames together will lie in a low-dimensional subspace. In fact if the 
background is completely static, the subspace would be one-dimensional. That subspace can be 

^We note here that in [29], the authors use a uniform distribution for the outliers, as opposed to Gaussian. The 
authors in [11] use ±1 BernouUi variables. Gaussian is the most challenging case, because more outliers will be near 
zero and confuse the estimation. 
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GRASTA po = 1.0 


GRASTA Po = 0.5 


GRASTA Po = 0.3 


lALM 


Pa = 0.1 


uj'-' = 1 X 10^* 


1.38 E-4 / 56.62 sec 


2.03 E-4 / 30.46 sec 


2.73 E-4 / 20.51 sec 


5.80 E-5 / 35.26 sec 


uj''' = 5 X 10^* 


3.64 E-4 / 58.31 sec 


4.65 E-4 / 31.23 sec 


6.07 E-4 / 20.79 sec 


1.B7 E-3 / 93.16 sec 


uj'' = 1 X lO^-' 


7.64 E-4 / 59.55 sec 


9.59 E-4 / 31.81 sec 


1.23 E-3 / 20.66 sec 


3.64 E-3 / 117.76 sec 


Pa = 0.3 


= 1 X 10~* 


4.65 E-4 / 67.90 sec 


7.28 E-4 / 35.10 sec 


1.06 E-3 / 22.96 sec 


1.80 E-4 / 232.26 sec 


ui^ = 5 X 10~* 


6.13 E-4 / 67.19 sec 


9.08 E-4 / 34.53 sec 


1.26 E-3 / 22.63 sec 


2.64 E-3 / 324.26 sec 


w'' = 1 X lO"'' 


9.87 E-4 / 69.06 sec 


1.44 E-3 / 35.61 sec 


1.93 E-3 / 22.85 sec 


5.62 E-3 / 362.62 sec 


P3 = 0.5 


1^^ = 1 X 10~* 


1.26 E-3 / 83.11 sec 


2.05 E-3 / 39.90 sec 


3.58 E-3 / 25.33 sec 


1.43 E-1 / 341.01 sec 


u,'' = 5 X 10~* 


1.33 E-3 / 81.51 sec 


2.24 E-3 / 40.33 sec 


3.93 E-3 / 25.38 sec 


1.45 E-1 / 351.26 sec 


U1-' = 1 X 10~^ 


1.64 E-3 / 82.23 sec 


2.85 E-3 / 41.91 sec 


5.08 E-3 / 25.67 sec 


1.62 E-1 / 372.21 sec 



Table 2: Recovery of corrupted low-rank matrices; a comparison between GRASTA and 
Robust PGA. We use full information of the corrupted matrices to do robust PGA, and vary 
the sub-sampling rate po from 0.3 to 1.0 (30% of the data to full information), to perform 
GRASTA. The matrix is 2000 x 2000, rank=5. We vary the fraction of corruptions from sparse 
outliers 10% to dense outliers 50%, and also vary the Gaussian noise variance from moderate 
noise perturbation co^ = 1 x 10~^ to relative strong noise corruption = 1 x 10~^. 



estimated in order to identify and separate the foreground objects; if the background is dynamic, 
subspacc tracking is necessary. GRASTA is uniquely suited for this burgeoning application. 

Here we consider three scenarios in the video tasks, with a spectrum of challenges for subspace 
tracking. In the first we have a video with a static background and objects moving in the foreground. 
In the second, we have a video with a still background but with changing lighting. In the third, we 
simulate a panning camera to examine GRASTA's performance with a dynamic background. The 
results are summarized in Table 3. 

4.5.1 Static Background 

If the video background is known to be static or near static, we can use GRASTA to track the 
background and separate the moving foreground objects in real-time. Since the background is 

static, we use GRASTA first to identify the background, and then we use only Algorithm 2 to 
separate the foreground from the background. More precisely we do the following: 

1. Randomly select a few frames of the video to train the static low-rank subspace U. In our 
experiments, we select frames randomly from the entire video; however for real-time processing 
these frames may be chosen from initial piece of the video, as long as we can be confident that 
every pixel of the background is visible in one of the selected frames. The low-rank subspace 
U is then identified from these frames using partial information. We use 30% of the pixels, 
select 50 frames for training, and set RANK = 5 in all the following experiments. 

2. Once the video background BG has been identified as a subspace U, separating the foreground 

objects FG from each frame can be simply done using Equation (4.7), where the weight vector 
wt can be solved for via Algorithm 2, again from a small subsample of each frame's pixels. 

BG = Uwt , . 

FG = video{t) - BG ^ ' 
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Table 3 shows the real-time" video separation results. From the first experiment, we use the 
"Hall" dataset from [28] which consists of 3584 frames each with resolution 144 x 176. We let 
GRASTA cycle 5 times over the 50 training frames just from 30% random entries of each frame 
to get the stationary subspace U. Training the subspace costs 6.9 seconds. Then we perform 
background and foreground separation for all frames in a streaming fashion, and when dealing with 
each frame we only randomly observe 5% entries. The separation task is performed by Equation 
(4.7), and the separating time is 62.5 seconds, which means we achieve 57.3 FPS (frames per second) 
real-time performance. Figure 8 shows the separation quality at t = 1, 230, 1400. In order to show 
GRASTA can handle higher resolution video effectively, wc use the "Shopping Mall" [28] video with 
resolution 320 x 256 as the second experiment. Wc also do the subspace training stage with the 
same parameter settings as "Hall". We do the background and foreground separation only from 
1% entries of each frame. For "Shopping Mall" the separating time is 39.1 seconds for total 1286 
frames. Thus we achieve 32.9 FPS real-time performance. Figure 9 shows the separation quality 
at f = 1, 600, 1200. In all of these video experiments we used a maximum of X = 20 iterations of 
the ADMM algorithm per subspace update. The details of each tracking set-up are described in 
Table 4. 



Dataset 


Resolution 


Total Frames 


Training Time 


Tracking and 
Separating Time 


FPS 


Hall 


144x176 


3584 


6.9 sec 


62.5 sec 


57.3 


Shopping Mall 


320x256 


1286 


23.2 sec 


39.1 sec 


32.9 


Lobby 


144x176 


1546 


3.9 sec 


71.3 sec 


21.7 


Hall with Virtual Pan (1) 


144x88 


3584 


3.8 sec 


191.3 sec 


18.7 


Hall with Virtual Pan (2) 


144x88 


3584 


3.7 sec 


144.8 sec 


24.8 



Table 3: Real-time video background and foreground separation by GRASTA. Here we use 
three different resolution video datasets, the first two with static background and the last 
three with dynamic background. Wc train from 50 frames; in the first two experiments they 
are chosen randomly, and in the last three they are the first 50 frames. In aU experiments, the 
subspace RANK = 5. 



Dataset 


Training 


Tracking 


Separation 


Training Algorithm 


Tracking/Separation Algorithm 




Sub-Sampling 


Sub-Sampling 


Sub-Sampling 






Hall 


30% 




5% 


Full GR ASTA Alg 1 


Alg 2+Eqn 4.7 


Shopping Mall 


30% 




1% 


Full GRASTA Alg 1 


Alg 2+Eqn 4.7 


Lobby 


30% 


30% 


100% 


Full GRASTA Alg 1 


Full GRASTA Alg 1 


Hall with Virtual Pan (1) 


100% 


100% 


100% 


Full GRASTA Alg 1 


Full GRASTA Alg 1 


Hall with Virtual Pan (2) 


50% 


50% 


100% 


Full GRASTA Alg 1 


Full GRASTA Alg 1 



Table 4: Here we summarize the approach for the various video experiments. When the 
background is dynamic, we use the full GRASTA for tracking. We used K = 20 iterations of 
the ADMM algorithm for all video experiments. 



4.5.2 Dynamic Background: Changing Lighting 

Here we want to consider a problem where the lighting in the video is changing throughout. We 
use the "Lobby" dataset from [28], which has 1546 frames, each 144 x 176 pixels. In order to adjust 

^Wc comment here that to call something "real-time" processing of course will depend on one's application 
requirements and hardware (camera frame capture rate, in the example of video processing). For example, standard 
35mm film video uses 24 unique frames per second. The maximum frame rate for most CCTVs is 30 frames per 
second. 
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t = 1 i = 230 t= 1400 



Figure 8: Real-time video background and foreground separation from partial information. 
We show the separation quality at t = 1, 230, 1400. The resolution of the video is 144 x 176. The 
first row is the original video frame at each time; the middle row is the recovered background at 
each time only from 5% information; and bottom row is the foreground calculated by Equation 
(4.7). 




t = l i = 600 t= 1200 



Figure 9: Real-time video background and foreground separation from partial information. 
We show the separation quality at i = 1, 600, 1200. The resolution of the video is 320 x 256. The 
first row is the original video frame at each time; the middle row is the recovered background at 
each time only from 1% information; and bottom row is the foreground calculated by Equation 
(4.7). 
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to the lighting changes, GRASTA tracks the subspace throughout the video; that is, unhke the 
last two experiments, we run the full GRASTA Algorithm 1 for every frame. We use 30% of the 
pixels of every frame to do this update and 100% of the pixels to do the separation. Again, see the 
numerical results in Table 3. The results are illustrated in Figure 10. 




i = 180 f = 366 t = 650 t = 1000 t = 1360 



Figure 10: Real-time video background and foreground separation from partial information. 
We show the separation quality at t = 180, 366, 650, 1000, 1360. The resolution of the video 
is 144 X 176 and has a total of 1546 frames. The first row is the original video frame at each 
time; the middle row is the recovered background at each time only from 30% information; and 
bottom row is the foreground calculated by Algorithm 2 using full information. The differing 
background colors of the bottom row is simply an artifact of colormap in Matlab. 



4.5.3 Dynamic Background: Virtual Pan 

In the last experiment, we demonstrate that GRASTA can effectively track the right subspace in 
video with a dynamic background. We consider panning a "virtual camera" from left to right and 

right to left through the video to simulate a dynamic background. Periodically, the virtual camera 
pans 20 pixels. The idea of the virtual camera is illustrated cleanly with Figure 11. 

We choose "Hall" as the original dataset. The original resolution is 144 x 176, and we set the 
scope of the virtual camera to have the same height but half the width, so the resolution of the 
virtual camera is 144 x 88. We set the subspace RANK = 5. Figure 12 shows how GRASTA can 
quickly adapt to the changed background in just 25 frames when the virtual camera pans 20 pixels 
to the right at t = 101. We also let GRASTA track all the 3584 frames and do the separation 
task for all frames. When we use 100% of the pixels for the tracking and separation, the total 
computation time is 191.3 seconds, or 18.7 FPS, and adjusting to a new camera position after the 
camera pans takes 25 frames as can be seen in Figure 12. When we use 50% of the pixels for 
tracking and 100% of the pixels for separation, the total computation time is 144.8 seconds or 24.8 
FPS, and the adjustment to the new camera position takes around 50 frames. 
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Figure 11: Demonstration of panning the "virtual camera" right 20 pixels. 




t=m) If =101 If = 105 i = 110 i = 115 if = 120 * = 125 



Figure 12: Real-time dynamic background tracking and foreground separation. At time 
t — 101, the virtual camera slightly pans to right 20 pixels. We show how GRASTA quickly 
adapts to the new subspace at t = 100, 105, . . . , 125. The first row is the original video frame 
at each time; the middle row is the tracked background at each time; the bottom row is the 
separated foreground at each time. 



5 Discussion and Future Work 

In this paper we have presented a robust online subspace tracking algorithm, GRASTA. The algo- 
rithm estimates a low-rank model from noisy, corrupted, and incomplete data, even when the best 
low-rank model may be changing over time. 

Though this work presents some very successful algorithms, many questions remain. First 
and foremost, because the cost function in Equation (2.3) has the subspace variable U which 
is constrained to a non-convex manifold, the resulting optimization is non-convex. A proof of 
convergence to the global minimum of this algorithm is of great interest. 
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GRASTA uses alternating minimization, alternating first to estimate (,s,w,y) and then fixing 
this triple of variables to estimate U. Observe that if (s, w, y) are correct estimates, we could then 
estimate U without the robust cost function. This would be quite useful in situations when speed 
is of utmost importance, as the GROUSE subspace update is faster than the GRASTA subspace 
update. Of course, knowing when {s,w,y) are accurate is a very tricky business. Exploring this 
tradeoff is part of our future work. 

We have shown that one of the very promising applications of GRASTA is that of separating 
background and foreground in video surveillance. We are very interested to apply GRASTA to 
more videos with dynamic backgrounds: for example, natural background scenery which may blow 
in the wind. In doing this we will study the resulting tradc-off between the kinds of movement 
that would be captured as part of the background and the movement that would be identified as 
foreground. 
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